The project examines the performance of CPU and GPU chips made since 2000. The data set includes chips from multiple vendors, including Intel, NVIDIA, ATI, and AMD. Other variables include the process size, thermal design power, die size, number of transistors, frequency, foundry, and GFLOPS. GFLOPS stands for billions of floating-point operations per second and is a common way to compare the performance of graphics cards.
# Preview the first five rows and a subset of the columns
chips_origional |>
  slice(1:5) |>
  select(2:5, 9:11)
# A tibble: 5 × 7
  Product   Type  `Release Date` `Process Size (nm)` `Freq (MHz)` Foundry Vendor
  <chr>     <chr> <chr>                        <dbl>        <dbl> <chr>   <chr>
1 AMD Athl… CPU   2007-02-20                      65         2200 Unknown AMD
2 AMD Athl… CPU   2018-09-06                      14         3200 Unknown AMD
3 Intel Co… CPU   2020-09-02                      10         2600 Intel   Intel
4 Intel Xe… CPU   2013-09-01                      22         1800 Intel   Intel
5 AMD Phen… CPU   2011-05-03                      45         3700 Unknown AMD
Feature Exploration
Chip Type
The data set is split roughly evenly, with \(2192\) CPUs and \(2662\) GPUs.
# A tibble: 2 × 2
  Type  Count
  <chr> <int>
1 CPU    2192
2 GPU    2662
Foundries
The data set contains chips made by nine different foundries, with TSMC and Intel making the vast majority of the chips. \(866\) chips didn’t have a foundry listed. The table below counts chips by vendor.
# A tibble: 5 × 2
  Vendor Count
  <chr>  <int>
1 AMD     1662
2 Intel   1392
3 NVIDIA  1201
4 ATI      535
5 Other     64
Transistors
The average number of transistors in a chip is \(1929.922\) million, but the median is only \(624\) million. The range is from \(8\) million up to \(54.2\) billion. The histogram shows an extreme right skew (skewness of \(6.13\)), which is why the mean sits so far above the median.
# Histogram of transistor counts (in millions)
ggplot(data = chips, aes(x = transistors)) +
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() +
  ylab("Number of Processors") +
  xlab("Transistors (millions)")
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 trans…  4143 1930. 4045.    624   1060.   546     8 54200 54192  6.13     61.7
# ℹ 1 more variable: se <dbl>
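The gap between the mean and the median is what a strong right skew looks like in summary statistics. The following is a minimal sketch in base R, using made-up toy values rather than the real data; the `skewness()` helper is hypothetical and uses a simple moment-based formula, which may differ from whatever produced the table above.

```r
# Toy transistor counts in millions (illustrative only, NOT from the data set)
transistors_toy <- c(8, 120, 300, 624, 900, 2500, 54200)

# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the 1.5 power of the second central moment)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

toy_mean   <- mean(transistors_toy)
toy_median <- median(transistors_toy)
toy_skew   <- skewness(transistors_toy)

# A single very large chip pulls the mean far above the median,
# and the skewness comes out positive
c(mean = toy_mean, median = toy_median, skew = toy_skew)
```

With a distribution like this, the median is usually the more representative "typical" value.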
Frequency
The average processor frequency is \(1484.406\) MHz, and the median frequency is \(1073.5\) MHz. The range is from \(100\) MHz up to \(4700\) MHz. The graph shows a right skew in the data with a peak at around \(500\) MHz.
# Histogram of processor frequency (MHz)
ggplot(data = chips, aes(x = freq)) +
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() +
  ylab("Number of Processors") +
  xlab("Frequency (MHz)")
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 freq    4854 1484. 1067.  1074.   1385.  674.   100  4700  4600 0.646     2.16
# ℹ 1 more variable: se <dbl>
Process Size
The average process size is \(55.1\) nm, and the median size is \(40\) nm. The range is from \(0\) nm up to \(250\) nm; a value of \(0\) nm is not physically meaningful and probably marks a missing entry. The graph shows a right skew in the data with a peak at around \(25\) nm.
# Histogram of process size (nm)
ggplot(data = chips, aes(x = process_size)) +
  geom_histogram(fill = "lightblue", color = "darkblue") +
  theme_minimal() +
  ylab("Number of Processors") +
  xlab("Process Size (nm)")
# A tibble: 1 × 13
  column     n  mean    sd median trimmed   mad   min   max range  skew kurtosis
  <chr>  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 proce…  4845  55.1  45.0     40    48.7    25     0   250   250  1.20     3.75
# ℹ 1 more variable: se <dbl>
This project will focus on determining which foundries and vendors
Questions of Interest:
Are we able to predict the foundry/vendor based on GPU statistics?
Are some foundries clearly producing better performing products than others?
Can we see Moore’s Law in the data?
What variables seem to play a role in a high fp32gflops value?
Can we create a model to predict GFLOPS?
Can we create a model to predict Vendor?
Is there a Best Foundry?
One potential issue when looking at foundries and performance is that the foundries have partnerships with certain vendors, so an apparent difference between foundries may really be a difference between the vendors they build for. Let's go ahead and look anyway with this in mind. The plot is interactive and shows the vendor when the points are hovered over.
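# Keep only chips with a reported FP32 GFLOPS value
gpu <- chips |>
  filter(!is.na(fp32gflops))

# Jittered points of performance by foundry; Vendor is carried as the tooltip label
foundry_glops <- ggplot(data = gpu, aes(x = Foundry, y = fp32gflops, label = Vendor)) +
  geom_jitter(alpha = 0.5) +
  theme_minimal() +
  ylab("FP32GFLOPS") +
  labs(
    title = "Foundry vs FP-32-GFLOPS",
    caption = "GFLOPS represents Billions of Floating Point Operations Per Second"
  )

# Convert to an interactive plot (ggplotly() comes from the plotly package)
ggplotly(foundry_glops, tooltip = "label")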
It appears that Samsung acts as a proxy for NVIDIA, as it mainly produces NVIDIA GPUs. TSMC produces products for everyone but Intel. I’d say Intel mainly focuses on CPUs, and many of its GPUs are just integrated ones. GF (GlobalFoundries) also has some high-performing chips, all of which are AMD chips.
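One way to check how tightly foundry and vendor are tied together is a simple cross-tabulation. This is a minimal sketch, assuming dplyr and tibble are installed; `gpu_toy` is a hypothetical stand-in with made-up rows, not the real `gpu` data frame, and only the `Foundry` and `Vendor` column names are taken from the plot code above.

```r
library(dplyr)

# Hypothetical rows standing in for the real `gpu` data frame
gpu_toy <- tibble::tribble(
  ~Foundry,  ~Vendor,
  "Samsung", "NVIDIA",
  "Samsung", "NVIDIA",
  "TSMC",    "NVIDIA",
  "TSMC",    "AMD",
  "TSMC",    "ATI",
  "GF",      "AMD",
  "Intel",   "Intel"
)

# Count chips per foundry/vendor pair, then compute each vendor's
# share of that foundry's output
foundry_vendor <- gpu_toy |>
  count(Foundry, Vendor, name = "Count") |>
  group_by(Foundry) |>
  mutate(Share = Count / sum(Count)) |>
  ungroup()

foundry_vendor
```

A foundry whose rows show a share close to 1 for a single vendor cannot be separated from that vendor in a performance comparison.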
Moore’s Law
Gordon Moore, co-founder of Fairchild Semiconductor and co-founder and later CEO of Intel, observed that the number of transistors in an integrated circuit doubles about every two years. In 1965, Moore predicted a doubling every year for at least a decade; in 1975, he revised the prediction to a doubling every two years, which has roughly held since then. We can look at the number of transistors over the time period of this data set, which covers chips made since 2000.
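A doubling every two years corresponds to \(N(t) = N_0 \cdot 2^{(t - t_0)/2}\), so on a \(\log_2\) scale the transistor count should grow as a straight line with a slope of about \(0.5\) per year. The sketch below illustrates this in base R. The `moore_projection()` helper and the starting value are hypothetical, and the "data" are generated from the formula itself rather than taken from the data set.

```r
# Hypothetical helper: projected transistor count under a fixed doubling period
moore_projection <- function(n0, year0, year, doubling_years = 2) {
  n0 * 2^((year - year0) / doubling_years)
}

# Illustrative starting point: 8 million transistors in 2000
moore_projection(8, 2000, c(2000, 2010, 2020))
# 8, 256 and 8192 million, i.e. 8 * 2^0, 8 * 2^5 and 8 * 2^10

# Estimating a doubling time: regress log2(count) on year.
# The slope is doublings per year, so its reciprocal is years per doubling.
toy <- data.frame(year = 2000:2020)
toy$transistors <- moore_projection(8, 2000, toy$year)

fit <- lm(log2(transistors) ~ year, data = toy)
doubling_time <- 1 / unname(coef(fit)["year"])
doubling_time
```

On real data the same regression would give a noisy estimate, and a doubling time near two years would be consistent with Moore's Law.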